Structure and Time-Evolution of an Internet Dating Community 
We present statistics for the structure and time evolution of a network constructed from user activity in an Internet community. The vastness and precise time resolution of an Internet community offer unique possibilities to monitor social network formation and dynamics. The time evolution of well-known quantities, such as clustering, mixing (degree-degree correlations), average geodesic length, degree, and reciprocity, is studied. In contrast to earlier analyses of scientific collaboration networks, mixing by degree between vertices is found to be disassortative. Furthermore, both the evolutionary trajectories of the average geodesic length and of the clustering coefficients are found to have minima.
I. INTRODUCTION

With the growing interest in social network analysis from the physics community, a new research area is emerging in the intersection between statistical physics and sociology (Albert and Barabasi 2002; Dorogovtsev and Mendes 2002; Newman 2003). Sociologists have been interested in network analysis for at least half a century, and together with mathematicians and statisticians they have developed a set of tools to analyze positions, structures, and processes of social networks (Wasserman and Faust 1994; Butts 2001). Although there are exceptions (Fararo and Sunshine 1964; Skvoretz 1990), most sociological and anthropological studies of networks have focused on small-group interaction or cognitive networks. In one respect this is quite natural, as most groups and formal organizations are of small size. There is also a pragmatic reason: data collection for large social networks, behavioral or cognitive, is cumbersome and often practically impossible to carry through. Therefore, although recent analyses (Watts and Strogatz 1998; Watts 1999; Newman 2001) have brought new attention to comparative analysis of large-scale social networks, the statistical physics method, emphasizing the limit of large system sizes (Albert and Barabasi 2002), has been of limited utility. However, the extended use of database technology provides new possibilities for constructing real-world networks, as in the analysis of, e.g., movie-actor networks (Watts and Strogatz 1998) and co-authorship in science (Newman 2001). Surely these networks reflect social interaction, but they are also heavily constrained by the logic of a particular industry or a particular professional activity. Thus, to allow for exploration of the possible universal properties of social networks in general, there is still an urgent need to analyze other types of large empirical social networks. In this paper we report on an investigation of a large social network, aiming to give a phenomenological description that will hopefully shed some new light on the processes forming the structure of social networks. To put results in context, we try to compare our findings to other studies whenever possible, and to contrast parameters with what would be expected from a random network with similar characteristics.

To construct network data and large graphs based on more spontaneous patterns of human interaction than, e.g., co-authorship and co-actorship, one can consider data from e-mail exchange (Ebel, Mielsch et al. 2002) or user activity in Internet communities (Rothaermel and Sugiyama 2001; Smith 2002). The present work belongs to the latter category, with a strong focus on the dynamics of the network. In contrast to previous studies of Internet communities (Smith 2002), we use down-to-the-second timing of the communication to investigate time evolution and obtain steady-state estimates of well-known measures of graph structure. We use data from a Swedish Internet community called pussokram.com (roughly "kiss'n'hug" in English) that is primarily targeted at adolescents and young adults. The community provides an arena for flirting, dating, and other romantic communication, as well as communication for non-romantic friendship.

Studies suggest that online interaction is driven by the same needs as face-to-face interaction, and should not be regarded as a separate arena but as an integrated part of modern social life (Wellman and Haythornthwaite 2002). Thus, communicative actions taken by members of the community can be expected to share many features with the web of human acquaintances and romances in the offline social world. Indeed, for many people in contemporary Western societies, interaction on the Internet is as real as any other interaction (Wellman 2001). Internet communities are interesting in and of themselves, but this suggests that the formation and dynamics of social networks in an Internet community can share the same generic properties as all social acquaintance networks, and that the study of Internet communities can provide important information for enhancing our understanding of social networks in general.

The paper is divided into four sections. In the next section we give a detailed description of the functions of the Internet community in focus. The third section contains statistical analyses and presentation of results, which we summarize and discuss in the fourth and concluding section.



II. THE INTERNET COMMUNITY PUSSOKRAM.COM 

Pussokram.com is a Swedish Internet community primarily intended for romantic communication and targeted at adolescents and young adults. The community had around 30000 active users during the spring and summer of 2002, the mean user age is 21 years, and approximately 70 percent of the users are women (therefore, and to simplify, we will use the female gender when referring to users in this paper). Both age and sex are self-reported. It is possible to have multiple accounts in the community. A crude check on the number of accounts linked to every unique e-mail address indicates that this is not very common (more than 99.7% of the membership accounts are associated with a unique e-mail address, and no e-mail address is associated with more than 5 accounts). 1 Our data consist of all the user activities on pussokram.com logged for 512 days from 13:39:25 on February 13, 2001 (t = 0) to 13:28:19 on July 10, 2002. The smallest time unit in the log is 1 second. We analyze the activity of all users registered at time t = 0, as well as the activity of any new users during this time span. 2 Time t = 0 defines the start-up day for this particular community. However, prior to t = 0 there was



1 Of course, it is possible to use a unique e-mail address for every membership account, but since this information is not revealed, it is hard to see why one would go to the extra effort of doing so.

2 Personal integrity is of course an issue here. For the analysis, we study anonymized data to prevent any intrusion of privacy, and we do not have access to specific message contents. Like everyone else, we can read the guest books, but we still cannot link a user (and her guest book) to the vertices of the network. Thus, we cannot identify any specific individual person in the data. We do not even


a mail server for sending anonymous love messages on the Internet. Registered users of this service had their accounts automatically transferred to pussokram.com. We only study activity in the community; nevertheless, this recruitment might induce a higher initial growth of active users.

Pussokram.com has a pronounced romantic profile, where:

• Users are encouraged to send messages to others that they are secretly in love with.

• The provider answers questions related to love and sex posed by the users under the pseudonym Dr. Love.

• The design of the HTML pages makes use of a romantic iconography well known to the targeted users (with Valentine's hearts, deep red colors, etc.; see Fig. 1). Nevertheless, a quick glance through some of the public guest books reveals that many of the contacts taken are also non-romantic.



A. Types of contacts in pussokram.com 

There are four major modes of communication at pussokram.com. We study each of the networks generated by these four types of contacts separately, and we also study the union of these networks, generated by any of these contacts. A brief description of the four types of contacts follows:

• Messages are in effect intra-community e-mails. These are private in the sense that no one in the community, except the sender and receiver, can access them. Not even information on how many messages other users have received is retrievable by other users.

• Guest book signing: each user has a guest book that every community member is free to write in.

• Flirt or "friendship request": User A can ask user B to be her friend. If user B accepts user A's request, then they can both easily see if the other is online whenever they are logged onto pussokram.com. Information on the friends of a specific user is private to that user only.

• Friendship: A friendship relation is established after acceptance of a friendship request, as described above. The friendship network is thus bidirectional. A friendship can be canceled by either of the friends.



have data that can be cross-examined with other databases (like computer IP addresses) to detect users' identities.



FIG. 1 Screenshot of a typical user homepage at pussokram.com. "User A", "User B", etc. symbolize user names. (The translation 
is due to the authors. Italics denote a description rather than a translation.) 



B. Ways to receive attention and search users 

Unless engaged in peer-to-peer contact of some sort, users at pussokram.com are relatively anonymous towards each other. There is reason to believe that knowledge about the prior interactive behavior of other individuals structures the present interactive behavior of a given individual (the so-called imitation factor). The only information about a user's interaction history available to other users. But there are several ways for a user to draw attention to herself (i.e. to direct other users to her community homepage), and for users to find information about others. Here we summarize various ways that can be used to receive attention, search for other users, and promote oneself at pussokram.com. The following information is displayed when a logged-on user browses the pussokram.com website:

• The username of the most recently registered com- 
munity member. 
• The name of the most recently edited diary (each user has a space, intended as a diary, that is open for others to read).

• The names of the most recent users to browse a 
specific user's homepage. 

• The names of similar users are displayed on a specific user's homepage. Similarity is assessed through self-reported background variables.

• A long interview with the "user of the week" (although it is updated less often than weekly). This is an epithet that users can apply for.

• Photographs of 10-20 users are displayed at the 
login-page. 

A user can search for other users with a search engine (the "sokofinder", in English "search'n'finder", in Fig. 1) that handles the following criteria: substring of the username, gender, age, place of residence, online status, and whether a user has provided a photograph of herself. Presumably, these are the characteristics that drive user activity, but because it is hard to assess their validity, and because we are only interested in structural properties, we do not conduct any analysis on them.



C. Comparisons with other empirical and statistical networks

For comparison we also use networks of instant messaging at the French Internet community nioki.com and scientific collaboration (or, rather, co-authorship) networks. nioki.com and pussokram.com are rather similar, both in terms of content and design, but compared to pussokram.com, nioki.com is even more youth-oriented and not as focused on romantic relations. Besides the possibility of searching for user names, nioki.com has two search procedures, recherche l'amitié (search for friendship) and recherche l'amour (search for love), where one can fill out questionnaires to find other users that match one's preferences. In the nioki.com network, an arc connects user A to user B if user B is in user A's list of contacts (for details see Smith 2002). In the scientific collaboration networks (Newman 2001) the vertices are scientists who have uploaded manuscripts to the Los Alamos preprint repository arXiv.org, and arcs are added between scientists who have co-authored a paper. In contrast to the pussokram.com and nioki.com networks, ties in the scientific collaboration network are bidirectional. Note that the pussokram.com networks are dynamic, while we only have access to snapshot data of the nioki.com and scientific collaboration networks. For this reason we can only compare the static properties of these networks.

In addition, following (Anderson, Butts et al. 1999; Pattison, Wasserman et al. 2000; Shen-Orr, Milo et al. 2002), we compare some observed quantities to the corresponding average values from randomized networks with the same degree sequence as the original. By this approach, we examine how aspects of structure other than the degree sequence influence the quantities. Every known real social network deviates from the average randomized network to a greater or lesser extent, depending on the social forces structuring the interaction. For example, with regard to the present case, we believe that an Internet community network will be closer to the average randomized network than several other types of social networks, because time and space constraints are much less pressing than in, e.g., a kinship network. These randomized networks are generated by sequentially going through all directed arcs A-B, and for every such arc randomly selecting another arc, C-D, and then rewiring so that A-D forms one arc and C-B forms another. The choice of C-D is made uniformly at random among all arcs that would not introduce a loop or a multiple arc. We use this algorithm to generate ~3000 networks, and the quantities are averaged over these networks. This procedure is inspired by Roberts (2000). However, it differs from Roberts in the sense that we use sweeps over all arcs (where each arc is rewired at least once) as the unit of iterations of the algorithm. 3
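The degree-preserving rewiring described above (and in footnote 3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: arcs are stored as a list of (tail, head) tuples forming a simple directed graph, the function names `rewire_sweep` and `randomized_average` are hypothetical, and the candidate set is rebuilt by brute force for every arc, which costs O(M²) per sweep and only suits small graphs.

```python
import random


def rewire_sweep(arcs, rng):
    """One sweep of degree-preserving rewiring over a list of (tail, head) arcs.

    Each arc (v, w) in turn is swapped with an arc (v2, w2), chosen uniformly
    among those for which the new arcs (v, w2) and (v2, w) create neither a
    self-loop nor a multiple arc. In- and out-degrees are preserved.
    """
    arc_set = set(arcs)
    for i in range(len(arcs)):
        v, w = arcs[i]
        # The membership tests forbid multiple arcs; they also exclude
        # trivial swaps (same tail or same head, or the arc itself).
        candidates = [
            j
            for j, (v2, w2) in enumerate(arcs)
            if v != w2 and v2 != w
            and (v, w2) not in arc_set
            and (v2, w) not in arc_set
        ]
        if not candidates:
            continue  # no admissible partner for this arc at the moment
        j = rng.choice(candidates)
        v2, w2 = arcs[j]
        arc_set -= {(v, w), (v2, w2)}
        arcs[i], arcs[j] = (v, w2), (v2, w)
        arc_set |= {(v, w2), (v2, w)}
    return arcs


def randomized_average(arcs, quantity, n_networks=10, sweeps=1, seed=0):
    """Average quantity(arcs) over independently randomized copies of arcs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_networks):
        sample = list(arcs)
        for _ in range(sweeps):
            rewire_sweep(sample, rng)
        total += quantity(sample)
    return total / n_networks
```

In practice one would pass a function computing, e.g., a mixing coefficient as `quantity`, and raise `n_networks` towards the ~3000 networks used in the text.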



III. STATISTICAL ANALYSIS 

The pussokram.com network consists of all registered users and the communication flow between these users as described above. Communication is conceived of as directed links between users. This is translated into a graph of vertices (users) and arcs (ties). Vertices are added to the network the first time a registered user is active, i.e. the first time the user sends or receives a message, signs a guest book, or sends or accepts a friendship request as described above. Each of these interactions defines a unique network, and by adding an arc for any activity one gets a total network of online activities. We thus study five networks, and for each of them the vertex set is empty at t = 0. We represent the network as a directed graph, G = (V, A), where V is the vertex set and A is the set of arcs, or ordered pairs of vertices. N = |V| denotes the order (number of vertices) of G, and M = |A| represents the number of arcs. Sometimes we study properties of the undirected graph obtained by taking the reflexive closure of G. 4

3 To be precise, our algorithm runs as follows: We go sequentially through the arc set A (see Sect. III). For every arc (v, w) we construct a set A' of arcs such that if a member (v', w') of A' is rewired with (v, w), i.e. so that (v, w) and (v', w') are replaced by (v, w') and (v', w), then no loops or multiple arcs are formed. Then we choose one of the arcs of A' uniformly at random and rewire that arc with (v, w).

FIG. 2 Time evolution of the number of vertices (a) and of the average degree (b).

FIG. 3 Reciprocity R (a) and assortative mixing coefficient r_dir (b) as functions of time.
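As an illustration of how the five networks, and the quantities N and M/N plotted against time, can be obtained from a time-stamped activity log, here is a minimal Python sketch. The log format (a list of (time in seconds since t = 0, sender, receiver, kind) tuples) and the kind labels are assumptions, not the actual pussokram.com schema; a friendship is stored as a pair of opposite arcs, and cancellations of friendships are ignored for simplicity.

```python
from collections import defaultdict


def snapshot_stats(events, t):
    """Return {network: (N, M/N)} using all events that occurred at time <= t.

    events: iterable of (time, sender, receiver, kind) tuples, where kind is one
    of the hypothetical labels "message", "guestbook", "flirt", "friendship".
    A vertex enters a network the first time it sends or receives in it.
    """
    arcs = defaultdict(set)  # network name -> set of directed arcs
    for time, a, b, kind in events:
        if time > t or a == b:
            continue
        # A friendship is mutual by definition, so it contributes both arcs.
        new = {(a, b), (b, a)} if kind == "friendship" else {(a, b)}
        arcs[kind] |= new
        arcs["all contacts"] |= new  # union over all contact types
    stats = {}
    for name, arc_set in arcs.items():
        n = len({u for arc in arc_set for u in arc})  # N: active vertices
        stats[name] = (n, len(arc_set) / n)  # (N, average degree M/N)
    return stats
```

Evaluating `snapshot_stats` on a grid of times (for example once per day) yields one growth curve for N and one for M/N per network.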



A. Decreasing growth rate of network size and convergence of average degree

For each network, the number of vertices, N, as a function of time during the sampling is displayed in Fig. 2(a), and the average degree, i.e. the average number of arcs per vertex, M/N, is displayed in Fig. 2(b). As can be seen, both the number of vertices and the average degree increase as functions of time, but at a decreasing rate. The average degree appears to converge to a constant, but for t < 100 it increases as a power function. The more rapid growth in the beginning of the period is explained by the fact that old users log on for the first time during our sampling period (see the discussion in Section II). The decreasing growth, and apparent approach to equilibrium, stand in contrast to the accelerated growth of the Internet and the World Wide Web (Dorogovtsev and Mendes 2002), as well as the linear growth of scientific co-authorship networks extracted from article databases (Newman 2001; Newman 2001; Barabasi, Jeong et al. 2002). However, in social networks the average degree cannot increase without bounds, and this holds for scientific collaboration networks too. We believe the difference stems from a wider effective sampling time frame: due to the much more rapid dynamics of an Internet community (compared to scientific collaborations), we are, relatively speaking, able to follow the process for a much longer period. In the sense that G is a steadily growing dynamic network, we deal with a non-equilibrium representation of the social situation. When we speak of the network "reaching equilibrium," we refer to the point when all quantities that are bounded as a function of N (such as the average degree) have reached their constant limits.



B. Reciprocity varies between networks 

Various types of social relations differ in direction, intensity, and frequency (Granovetter 1973). Messages between agents with different social status, for example, tend to be unevenly distributed (Gould 2002). In the present analysis, we can investigate the reciprocity of communicative action by looking at the direction of the communication flow between any two users. For example, if user A sends a friendship request to user B, we observe a link from user A to user B and note an arc between the two vertices. But it makes quite a difference whether user B accepts the invitation or not, i.e. whether we note one or two arcs between the vertices. We define reciprocity, R, as the fraction of mutual dyads, i.e. the ratio between the number of vertex pairs {v, w} that occur in two arcs ((v, w) and (w, v)) and the number of vertex pairs that occur in at least one arc. More formally:



$$ R = \frac{2M}{M_2} - 1 \qquad (1) $$



4 I.e. the graph obtained if, for every (u, v) ∈ A with (v, u) ∉ A, the arc (v, u) is added to A (what is usually called the symmetric closure of G).



where M_2 is the number of arcs in the reflexive closure of G. R lies in the interval [0, 1]; R = 0 implies that (v, u) is not an arc for any arc (u, v), and R = 1 implies that (v, u) is an arc for every arc (u, v).
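The expression in terms of M and M_2 follows from counting dyads: with m mutual and s one-way vertex pairs, M = 2m + s and M_2 = 2m + 2s, so 2M/M_2 − 1 = m/(m + s), the fraction of mutual dyads. A minimal Python sketch, assuming arcs are given as (tail, head) tuples without self-loops; both function names are hypothetical, and the second one only cross-checks the first by explicit counting.

```python
def reciprocity(arcs):
    """Fraction of connected vertex pairs that are linked in both directions.

    Computed as R = 2M / M_2 - 1, where M is the number of arcs and M_2 the
    number of arcs after adding (w, v) for every arc (v, w).
    """
    arc_set = set(arcs)
    m = len(arc_set)
    m2 = len(arc_set | {(w, v) for v, w in arc_set})
    return 2 * m / m2 - 1


def reciprocity_by_counting(arcs):
    """Same quantity, counting mutual and connected pairs explicitly."""
    arc_set = set(arcs)
    pairs = {frozenset(arc) for arc in arc_set}  # unordered connected pairs
    mutual = sum(1 for v, w in arc_set if (w, v) in arc_set) // 2
    return mutual / len(pairs)
```

For the arcs (1, 2), (2, 1), (1, 3), one of the two connected pairs is mutual, and both functions return 0.5.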

The time evolution of the reciprocity can be seen in Fig. 3(a). As is evident from the figure, reciprocity levels differ between the networks. By definition, the friendship network has a reciprocity of 1, and by the same token, the flirt network has a reciprocity equal to zero. For the other networks, the curves converge to values around 0.4 for the guest book and messages networks, and 0.5 for the all-contacts network (see Table I). It is hard to judge whether these are high or low values of reciprocity. They are, however, compatible with data for the French Internet community nioki.com. We normally assume acquaintance networks to have a high degree of reciprocity, but one reason to expect a lower value for online interaction is that an actor feels less social pressure to respond to a communicative act over the Internet than in, for example, a face-to-face or telephone encounter.

TABLE I Assortative mixing coefficients, r, for five pussokram.com networks, and for the nioki.com and arXiv.org networks. Statistics for corresponding randomized networks are within square brackets. Differences between the various mixing coefficients are discussed in the text. Double hyphens indicate missing data. Note: * p < 0.01. The nioki.com and arXiv.org data are not tested for significance.

network        N       r         r_dir     r_in,in   r_in,out  r_out,in  r_out,out
all contacts   (values illegible in the source)
messages       (values illegible in the source)
guest book     20691   -0.073*   -0.085*   -0.097*   -0.043*   -0.088*   -0.053*
                       [-0.049]  [-0.038]  [-0.024]  [-0.015]  [-0.042]  [-0.026]
friends        14278   -0.042*   --        --        --        --        --
                       [0.031]
flirts         8186    -0.12*    -0.12*    -0.006    -0.022    -0.12*    -0.042*
                       [-0.12]   [-0.10]   [0.016]   [-0.002]  [-0.10]   [-0.013]
nioki.com      50259   -0.13     -0.10     -0.088    -0.084    -0.10     -0.095
                       [-0.034]  [-0.014]  [-0.018]  [-0.014]  [-0.020]  [-0.016]
arXiv.org      52909   0.36      --        --        --        --        --
                       [-0.034]



C. Disassortative mixing coefficients of the pussokram.com networks

Together with the degree distribution, the degree-degree correlation is considered to govern much of the network's robustness towards disturbances, as well as the information flow. In other contexts the discussion is usually phrased in terms of resilience against epidemics and attack. A positive degree-degree correlation is also referred to as assortative mixing by degree, and it means that vertices of high degree preferentially attach to each other, and likewise for vertices of low degree. For example, assortative mixing makes networks more vulnerable to outbreaks of diseases and more robust against strategic attack (Newman 2002): if people with many contacts are connected to other people with many contacts, the epidemic threshold is lowered. Disassortative mixing, on the other hand, gives rise to larger epidemics (Morris and Kretzschmar 1995).

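We measure assortative mixing by calculating Pearson's correlation coefficient r for the degrees at either side of an arc, as suggested by Newman (2002):

$$ r = \frac{\langle k_{\mathrm{to}}\, k_{\mathrm{from}} \rangle - \langle k_{\mathrm{to}} \rangle \langle k_{\mathrm{from}} \rangle}{\sqrt{\left( \langle k_{\mathrm{to}}^2 \rangle - \langle k_{\mathrm{to}} \rangle^2 \right) \left( \langle k_{\mathrm{from}}^2 \rangle - \langle k_{\mathrm{from}} \rangle^2 \right)}} \qquad (2) $$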

In Eq. (2), ⟨...⟩ denotes the average over arcs, k_from is some (in-, out-, or total) degree of the vertex that the arc starts from, and k_to is some degree of the vertex that the arc leads to. We look at the total-degree coefficient both for the bidirectional graph, r (where the reflexive closure has been taken if the network is not bidirectional by definition), and for the directed graph, r_dir. Furthermore, we measure the four combinations of in- and out-degree correlations; e.g., the out-in correlation coefficient indicates whether users that have many contacts (high out-degree) prefer to communicate with those users that themselves receive communication from many users (high in-degree).
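A minimal Python sketch of the arc-averaged Pearson coefficient follows, assuming arcs are (tail, head) tuples; the function name and the `from_kind`/`to_kind` parameters are illustrative assumptions, not part of the original analysis. Choosing "out" and "in" gives the out-in coefficient; applying the function to the arcs of the reflexive closure (each undirected edge stored in both directions) gives the bidirectional r, and "total"/"total" on the directed arcs gives r_dir.

```python
import math
from collections import Counter


def mixing_coefficient(arcs, from_kind="out", to_kind="in"):
    """Pearson correlation of degrees at the two ends of each arc.

    from_kind / to_kind select "in", "out" or "total" degree for the vertex
    the arc starts from (k_from) and the vertex it leads to (k_to).
    """
    out_deg = Counter(v for v, _ in arcs)
    in_deg = Counter(w for _, w in arcs)

    def degree(vertex, kind):
        if kind == "in":
            return in_deg[vertex]
        if kind == "out":
            return out_deg[vertex]
        return in_deg[vertex] + out_deg[vertex]  # total degree

    k_from = [degree(v, from_kind) for v, _ in arcs]
    k_to = [degree(w, to_kind) for _, w in arcs]
    m = len(arcs)
    mean_from, mean_to = sum(k_from) / m, sum(k_to) / m
    cov = sum(a * b for a, b in zip(k_from, k_to)) / m - mean_from * mean_to
    var_from = sum(a * a for a in k_from) / m - mean_from ** 2
    var_to = sum(b * b for b in k_to) / m - mean_to ** 2
    if var_from <= 0 or var_to <= 0:
        return float("nan")  # undefined when all degrees on one side are equal
    return cov / math.sqrt(var_from * var_to)
```

For example, on a star graph stored in both directions every arc joins the hub to a leaf, so the coefficient is -1, the extreme case of disassortative mixing.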

The values for pussokram.com and other networks are displayed in Table I. Interestingly enough, all the pussokram.com networks, as well as the nioki.com network, display significant disassortative mixing for all types of degree-degree correlations. This is in contrast to what has been measured for (scientific, actor, and business) collaboration networks (Newman 2002). To set these results in perspective, we also measure r for a scientific collaboration network, which clearly displays a positive assortative mixing coefficient. Perhaps assortative mixing is significant only for interaction in competitive areas, such as professional collaborations (where only already big names are likely to be successful in collaborating with other big names). This result relates to research on exchange networks claiming that negative mixing is optimal when actors are substitutable, as, for example, in friendship and dating networks (Cook, Emerson et al. 1983). In contrast, mixing in professional collaboration is positive because both knowledge and already established channels for cooperation screen off potential alternative collaborators. Another issue is the skewness of the degree distribution. Intuitively, a large spread in the degree distribution will increase the likelihood of observing negative mixing. And, as can be seen from the randomized networks in Table I, given the degree distribution we would expect a negative mixing coefficient. However, the observed coefficients are consistently, and significantly, more negative than expected. This strongly suggests that negative mixing arises from this particular form of social interaction, in which alters are substitutable (Cook, Emerson et al. 1983). Note, though, that some network models, analyzing completely different forms of interaction, with skewed degree distributions produce networks of zero or positive assortative mixing (Newman 2002; Park and Newman 2003).

The six different assortative mixing coefficients of Table I are all of the same sign and roughly of the same magnitude. This is interesting since it suggests that the r-values result from other structures (presumably the degree sequence) rather than from the behavior of individuals: there are no a priori reasons for r_in,out to be the same as, e.g., r_in,in, as a large r_in,out means that actors that are active in the community (have a high k_out) tend to associate with those who are successful in promoting themselves in the community (have a high k_in), while a large r_in,in means that the latter category has a preference for each other.

Fig. 3(b) shows the time development of the assortative mixing coefficient r_dir (the time development of the other assortative mixing coefficients of Table I is qualitatively similar). We see that r_dir converges more quickly than the average degree. This is not surprising, since the correlation coefficient is a function of the way ties are formed rather than of the size or average degree of the network. An interesting detail of Fig. 3(b) is the jump at t ≈ 300 days in the flirt (friendship request) network. This is due to the formation of a tie between two of the most connected actors. (The fact that the flirt network is by far the sparsest strengthens this effect.)



D. Cumulative degree distributions are highly skewed 

The degree distribution has received much attention in comparative analyses of complex networks since the work of Barabasi and Albert (1999). A skewed degree distribution is commonly regarded as a cumulative effect in the attachment of new arcs to the network (Simon 1955; Barabasi and Albert 1999), and it offers a way to classify different types of networks (Amaral, Scala et al. 2000). Indeed, it has been demonstrated that many apparently dissimilar types of networks share the same highly skewed degree distributions of a (truncated) power-law form (Albert and Barabasi 2002), indicating an emerging scale-free structure. Such degree distributions are generated through a growth process in which new arcs are drawn only between already existing vertices and new vertices. However, a process that reasonably describes the activity of an Internet community would also allow new arcs to be drawn between two already existing vertices. Such a mixed process, however, would result in a stretched exponential distribution rather than a power law, and thus a stretched exponential distribution is what we would expect to observe. Another process that can be responsible for cutting the tails of power-law degree distributions in real-world networks is the limited capacity of the actors.

Following (Liljeros, Edling et al. 2001) we measure the cumulative degree distribution of all the pussokram.com networks; see Fig. 4. If the degree distribution follows a power law with exponent γ, then the cumulative distribution will have the exponent α = γ − 1. All pussokram.com networks are highly skewed, but none of them fits a power-law form across the whole range observed. However, it is interesting to note that there are no clear signs of the (inevitable) high-degree truncation in any of the graphs (Fig. 4). A previous study of the French nioki.com has reported a power-law fit of the cumulative degree distribution (Smith 2002). Our result might appear to set the pussokram.com community apart from the nioki.com community, but a closer inspection of our graphs and of (Smith 2002) reveals a striking similarity in the functional form of the distribution. We therefore conclude that the dynamics shaping the degree distribution are to a large extent the same for the two communities.
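The cumulative distribution P(K ≥ k) avoids binning the raw degree histogram, which is noisy in the sparse high-degree tail. A minimal Python sketch, assuming the degrees of one network (in-, out- or total) are given as a list of nonnegative integers; the function name is hypothetical.

```python
from collections import Counter


def cumulative_degree_distribution(degrees):
    """Return [(k, P(K >= k))] for each distinct observed degree k, in increasing k."""
    n = len(degrees)
    counts = Counter(degrees)
    result = []
    remaining = n  # number of vertices with degree >= current k
    for k in sorted(counts):
        result.append((k, remaining / n))
        remaining -= counts[k]
    return result
```

Plotted on doubly logarithmic axes, a power-law tail would appear as a straight line, so curvature across the observed range indicates deviations from a pure power law.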

E. Evolution of average geodesic length 

As a general measure of how closely connected a graph is, the average geodesic (shortest-path) length is one of the most studied network quantities. There is no unique natural definition of the average geodesic length in an arbitrary directed graph; the problem is the contribution from disconnected pairs of vertices. One choice is to measure the geodesic distance averaged over pairs of vertices in the giant component:

'^ GCI (U,V)€A GC 

where d(u, v) is the distance between u and v, and Aqq 
is the arc-set of the giant component. Another option 
is to average the inverse geodesic length (Latora and 
Marchiori 2001), 

r 1 = - T — ^— (4) 

M t-i d(u,v) ' V ; 

(u,v)€A v ' 

where l/d(u,v) is defined as zero when no path exists 
from u to v. In the present paper we focus on Z -1 , and Zgc 
for the reflexive closure of G. If the two measures agree, 
we can infer that there is no additional effect influencing 
the shortest paths in a substantial way, other than the bi- 
directional structure of the largest connected subgraph. 

As time evolves, two conflicting mechanisms govern the average geodesic length: the increasing number of vertices tends to increase $\ell$, whereas the increasing average degree makes $\ell$ shorter. For the pussokram.com data, the latter effect dominates during the time span of our data set, giving a monotonically decreasing $\ell_{GC}$ (monotonically increasing $\ell^{-1}$), as shown in Fig. 5. The same situation has been reported for scientific collaboration networks (Barabasi, Jeong et al. 2002).
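Both averages of the geodesic length can be obtained from one breadth-first search per source vertex. The Python sketch below is illustrative only and rests on several assumptions: the graph is an adjacency dict in which every vertex appears as a key; $\ell_{GC}$ is evaluated on the symmetrized graph (every arc made bidirectional), with the giant component taken to be its largest connected component; and all function names are hypothetical. One BFS costs O(N + M), so the all-sources loop costs O(N(N + M)), which is fine for toy graphs but would call for sampling of sources on a network of community size.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def symmetrize(arcs):
    """Undirected adjacency dict: every arc (u, v) becomes bidirectional."""
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def largest_component(adj):
    """Vertex set of the largest connected component of an undirected graph."""
    seen, best = set(), set()
    for s in adj:
        if s not in seen:
            comp = set(bfs_distances(adj, s))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return best


def avg_geodesic_gc(adj):
    """Mean distance over ordered pairs of distinct vertices in the largest component."""
    gc = largest_component(adj)
    total = pairs = 0
    for u in gc:
        for w, d in bfs_distances(adj, u).items():
            if w != u:
                total += d
                pairs += 1
    return total / pairs


def inverse_geodesic(adj):
    """Mean of 1/d(u, w) over all N(N-1) ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    s = sum(1.0 / d for u in adj for w, d in bfs_distances(adj, u).items() if w != u)
    return s / (n * (n - 1))


# Hypothetical toy graph: path 0-1-2 plus a separate edge 3-4.
adj = symmetrize([(0, 1), (1, 2), (3, 4)])
print(avg_geodesic_gc(adj), inverse_geodesic(adj))
```

Here the disconnected edge 3-4 is ignored by the giant-component average but still contributes to the inverse average, which is exactly why the two measures can disagree.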
F. Density of short circuits



Acquaintance networks are expected to have a high degree of transitivity (Wasserman and Faust 1994), or in other words, a high density of triangles, since if person A knows persons B and C, then B and C are likely to be acquainted. We apply a commonly used measure that gives the fraction of triangles out of the connected 3-paths of the graph (a quantity that was defined for undirected graphs but is trivially generalized to directed graphs, for which we use the subscript "dir").
If we let p(n) denote the number of representations of paths⁵ and c(n) denote the number of representations of circuits of length n, then we can express the clustering coefficient⁶ C as:

$$C = \frac{c(3)}{p(3)}. \qquad (5)$$

Assuming the community outlives its members, $\ell$ will eventually start to increase (when the number of inactive users slows down the accelerated growth sufficiently).

5 A representation of a path of length three is a triplet (u, v, w) such that (u, v) and (v, w) are arcs. In an undirected network a path has two representations and a triangle has six representations.



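One can expect social networks with many heterosexual romantic relationships, such as the pussokram.com networks, to have rather few triangles,⁷ since a network of purely heterosexual contacts is bipartite and therefore contains no circuits of odd length (circuits of length four remain possible). To get a better picture of the density of short circuits, we also measure the density of circuits of length four: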



$$D = \frac{c(4)}{p(4)}. \qquad (6)$$



The n-dependence of c(n)/p(n) varies from network to network and could possibly be an informative quantity in itself. A very high C will in most cases probably imply a high D (in an R = 1 network, two triangles sharing one arc contribute to c(4)), but the reverse is less certain.
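The counts p(n) and c(n) behind C and D can be obtained by exhaustive enumeration of paths with n distinct vertices. The following Python sketch is a hypothetical illustration, not the authors' code: it assumes an adjacency dict of out-neighbors in which every vertex is a key, treats a representation as an ordered tuple of distinct vertices (so a reciprocated arc does not form a "path" (u, v, u)), and names such as `count_paths_and_circuits` are invented. The cost grows roughly like N times the average degree to the power n - 1, so it is only practical for n = 3 or 4 on modest graphs.

```python
def count_paths_and_circuits(adj, n):
    """Return (p(n), c(n)): representations of paths and circuits with n distinct vertices.

    A representation is an ordered tuple (v1, ..., vn) of distinct vertices with
    arcs (v1, v2), ..., (v_{n-1}, vn); it is also a circuit if (vn, v1) is an arc.
    """
    p = c = 0

    def extend(path, on_path):
        nonlocal p, c
        if len(path) == n:
            p += 1
            if path[0] in adj[path[-1]]:  # closing arc back to the start
                c += 1
            return
        for w in adj[path[-1]]:
            if w not in on_path:
                path.append(w)
                on_path.add(w)
                extend(path, on_path)
                path.pop()
                on_path.remove(w)

    for v in adj:
        extend([v], {v})
    return p, c


def circuit_density(adj, n):
    """c(n)/p(n): the clustering coefficient C for n = 3 and D for n = 4."""
    p, c = count_paths_and_circuits(adj, n)
    return c / p if p else 0.0


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}  # bipartite: no odd circuits
print(count_paths_and_circuits(triangle, 3))  # (6, 6)
print(circuit_density(square, 3), circuit_density(square, 4))  # 0.0 1.0
```

The undirected triangle reproduces the six representations mentioned in the footnote, and the square, being bipartite, has C = 0 but D = 1, the same mechanism by which a romantically oriented contact network can have few triangles yet a non-negligible density of 4-circuits.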

Values for $C_{dir}$ and $D_{dir}$ and their undirected counterparts are shown in Table II. We note that, with a few exceptions, the values for the real networks are significantly larger than for the randomized ones; the difference, however, is far less dramatic than for the scientific collaboration network. This contrast between the Internet community networks and the arXiv.org data is easily explained by the fact that a paper with $n_{auth} \geq 3$ authors represents a fully connected subgraph of G (contributing $n_{auth}(n_{auth}-1)(n_{auth}-2)/3$ triangles). However, we would like to stress that the values themselves are less informative than their time dependence.

The time development of C and D for the different networks is shown in Fig. 6. Since it depends only on the local network structure, the density of short circuits is an intrinsic quantity; and, as seen for the clustering coefficient (Barabasi, Jeong et al. 2002), these quantities approach their equilibrium values from above. Interestingly, just as for the assortative mixing coefficient, the relaxation towards equilibrium is faster for C and D than for the average degree M/N; i.e., the density of short cycles is rather independent of the average degree.
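Time evolution of this kind amounts to recomputing a quantity on cumulative snapshots of the contact network. A minimal Python sketch, under stated assumptions: contacts are taken to be (time, sender, receiver) triples, a snapshot at time t contains every distinct arc created at or before t, and N counts only vertices incident to at least one arc (the paper's N may instead count all registered users). The names `snapshots` and `average_degree` are hypothetical; any other measure, such as C or D, could be evaluated on each snapshot in the same loop.

```python
def snapshots(contacts, times):
    """Yield (t, arcs) with the set of distinct arcs created at or before each t.

    contacts: iterable of (time, sender, receiver) tuples (assumed format).
    """
    contacts = sorted(contacts)
    arcs, i = set(), 0
    for t in sorted(times):
        while i < len(contacts) and contacts[i][0] <= t:
            _, u, v = contacts[i]
            arcs.add((u, v))  # repeated contacts along the same arc count once
            i += 1
        yield t, frozenset(arcs)


def average_degree(arcs):
    """M/N, with N taken as the number of vertices incident to at least one arc."""
    vertices = {x for arc in arcs for x in arc}
    return len(arcs) / len(vertices) if vertices else 0.0


contacts = [(1, "a", "b"), (2, "b", "a"), (2, "a", "c"), (5, "c", "a")]
for t, arcs in snapshots(contacts, [1, 2, 5]):
    print(t, round(average_degree(arcs), 3))
```

Sorting the contacts once and advancing a single pointer keeps the cost linear in the number of contacts plus snapshots, rather than rebuilding each snapshot from scratch.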

As can be seen in Fig. 6, most C and D curves have extrema in the middle of the time range (where the density of short circuits is at its minimum). This results from a conflict between counteracting mechanisms acting on different time-scales. There are three natural time-scales in the system: the average time between new registrations; the average time between new contacts for an individual user; and the average life span of a user in the community. The latter time-scale should be responsible for the long-term behavior, such as the increase of M/N towards equilibrium. And as short circuits are more likely in a dense network, it is natural that C and D increase in the large-t limit. The decrease at early times is a finite-size effect that can be seen in evolving network models with constant average degree, such as the Barabasi-Albert model (Barabasi and Albert 1999; Barabasi, Albert et al. 1999; Barabasi, Jeong et al. 2002) and its extensions (Holme and Kim 2002), where the C and D curves converge from above.

6 This quantity is sometimes called transitivity, sometimes clustering coefficient. Note, however, that it is not identical to Watts and Strogatz's (1998) clustering coefficient (where a local transitivity measure is averaged over the vertex set).

7 Presumably, homosexual relationships are not the most common type of romantic relationship among Swedish adolescents; therefore we expect few triangles. Conversely, in a community populated largely by homosexual individuals, we would expect the number of triangles to be much higher. Regrettably, we cannot test this hypothesis with the available data.

Another interesting aspect is that the values of C and D, although finite in the large-t limit, are much smaller than in the actor and scientific-collaboration networks. In an Internet community, the mechanism by which people introduce acquaintances who are strangers to each other (Newman 2001; Holme and Kim 2002) is likely not what is responsible for the finite clustering (recall that in network models such as the Erdos-Renyi (1959) and Barabasi-Albert (Barabasi and Albert 1999; Barabasi, Albert et al. 1999; Barabasi, Jeong et al. 2002) models, the clustering goes to zero as the network grows). Instead, a finite density of short circuits can be explained by the tendency captured in the proverb "like attracts like", where similarity is defined by signaled social, psychological, and physiological traits.⁸

To further convince ourselves that the sampling time is long enough, we also use rewiring to examine the time evolution of two structural measures (the assortative mixing coefficient and the clustering coefficient for the undirected all-contacts network). As seen in Fig. 7, the rewired quantities converge on the same time scale as r and C, which reconfirms that the sampling time frame is sufficient. We note that for t > 200 days the assortative mixing coefficient is significantly lower than the rewired reference curve. For the same time interval the rewired clustering coefficient closely overlaps the measured C-value; for t > 200 days the actual value lies within the interquartile range of the rewired data for around 30% of the 512 days. For the initial 'non-equilibrium' part of the time evolution (t < 100 days), the curves of the rewired and real networks diverge. In this region the network is rather sparse (see Fig O), which explains the low C-values of the rewired curve. The high early values of C seem to contradict the apparent absence of a tendency towards triangle formation at later times. This means that the contact patterns of the early network are not the same as later on. As it turns out, in the early community one group of actors contact each other rather frequently (more like 'chatting' than romantic contact making), whereas another group makes a few contacts before quitting the community. We interpret this as meaning that a minimal number, or "critical mass" (cf. Schelling 1978), of people is required for the community to function. Before the critical mass is reached, the users either use the community as a chat room (a usage with a presumably smaller critical mass) or leave it.

8 Another possible explanation for the convergence of C and D to finite values is that short circuits are introduced from the offline world outside the community. Reading users' guest books, however, gives the impression that the vast majority of community dyads were strangers offline. We believe that this effect is negligible, but we are unfortunately unable to go beyond speculation on this point.

TABLE II Statistics for the fully grown pussokram.com networks, with nioki.com and arXiv.org networks provided for comparison. Statistics for the corresponding randomized networks are within square brackets. Double hyphens indicate missing data. Note: * p < 0.01. † The 'friends' and 'arXiv.org' data sets are undirected; M denotes the number of undirected edges (which is half the number M of arcs in a directed representation of the graph). The nioki.com and arXiv.org data are not tested for significance.

FIG. 6 Density of short circuits for the different networks (the flirt network is omitted as it contains very few 3- and 4-circuits).
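A rewired reference of the kind shown in Fig. 7 can be sketched as follows. This section does not spell out the rewiring scheme, so the Python code assumes the standard degree-preserving double-edge swap on an undirected edge list, with one "sweep" taken to mean as many attempted swaps as there are edges; the helper names and the toy graph are hypothetical. The assortative mixing coefficient r is computed with Newman's (2002) edge-based formula, i.e., the Pearson correlation of the degrees at the two ends of each edge.

```python
import random
import statistics
from collections import Counter


def assortativity(edges):
    """Newman's (2002) degree-degree correlation coefficient r for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    m = len(edges)
    mean_prod = sum(deg[u] * deg[v] for u, v in edges) / m
    mean_end = sum(deg[u] + deg[v] for u, v in edges) / (2 * m)
    mean_sq = sum(deg[u] ** 2 + deg[v] ** 2 for u, v in edges) / (2 * m)
    return (mean_prod - mean_end ** 2) / (mean_sq - mean_end ** 2)


def rewire(edges, sweeps=100, seed=None):
    """Degree-preserving randomization by double-edge swaps (an assumed scheme).

    One sweep is taken to be len(edges) attempted swaps; swaps that would create
    a self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    m = len(edges)
    for _ in range(sweeps * m):
        i, j = rng.randrange(m), rng.randrange(m)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # randomize which endpoints get exchanged
        if len({a, b, c, d}) < 4:
            continue  # same edge or shared vertex: the swap could create a loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


# Hypothetical test graph: a 10-cycle with two chords.
edges = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
r_real = assortativity(edges)
r_rewired = [assortativity(rewire(edges, sweeps=100, seed=s)) for s in range(20)]
q1, _, q3 = statistics.quantiles(r_rewired, n=4)  # first and third quartiles
print(f"r = {r_real:.3f}; rewired interquartile range [{q1:.3f}, {q3:.3f}]")
```

Because every swap keeps each vertex's degree, any gap between the measured r and the rewired quartiles reflects correlations beyond those imposed by the degree sequence. The quartiles from `statistics.quantiles` approximate, but need not equal, Tukey's hinges.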
FIG. 7 Time evolution of original and rewired quantities. (a) The assortative mixing coefficient r for the undirected all-contacts network. (b) The clustering coefficient for the same data. The rewired data are obtained from 100 updating sweeps over all links and are indicated by the upper and lower hinges (the boundaries between the first and second quartiles, and between the third and fourth quartiles, respectively).



IV. SUMMARY AND CONCLUSIONS 

We have investigated networks of communication between the users of the Internet community pussokram.com. The four different means of contact at pussokram.com define five different networks in our study (one for each means separately and one for all taken together). Apart from recent studies of scientific collaboration networks and movie actor networks, there are very few such phenomenological descriptions of large social networks, and thus there is limited prior knowledge to which our findings can be related.

The fact that the interaction under study takes place on the Internet clearly creates special conditions for communication. We believe that online interaction is exposed to fewer structural forces than is typically the case in most other social settings. For example, simultaneous interaction is not a prerequisite for communication in an Internet community, so time as a structural force is of less importance than in most other settings. Nor does geographical space constrain communication. In addition, social signifiers are less visible (compared to, e.g., face-to-face interaction), and the relative ease with which one can conceal one's identity and transform one's appearance online further reduces the structure-forming forces at work in 'offline' social activity. It is therefore interesting to note that, despite these caveats, the networks under study here are much more structured than would be expected in a random network.

To summarize our findings for the Internet community pussokram.com, we see that:

• The average degree converges over time, but surprisingly we observe no cut-off in the degree distribution. Previous studies do suggest that there is an upper limit to the mean number of contacts (Marsden 1987), and on average we find this socio-cognitive limitation even though time and space are of less importance here. The reason we see no cut-off in the tail of the cumulative degree distribution might be that it is relatively costless to have a high turnover of one's contacts in an online community: contacts are established without much investment and can also be dropped without much sanctioning.

• Reciprocity is rather low, presumably lower than can be expected in a regular acquaintance network. Reciprocity levels quickly converge to a steady state.

• Most assortative mixing coefficients have small negative values, suggesting a pattern of disassortative mixing. This can partly be explained by the conventional effect of the skewed degree sequence (Newman 2002). However, the observed effect is significantly larger than can be expected solely from the degree distribution. One explanation for these larger magnitudes of r is the particular nature of the dating interaction (Cook, Emerson et al. 1983). We also find that the mixing coefficients converge rapidly as a function of time. The disassortative mixing in the Internet community networks is in striking contrast to the strong assortative mixing seen in scientific collaboration networks, and the close correspondence with previous work in sociology indicates that Internet communities indeed strongly resemble offline social communities.

• The cumulative degree distributions are highly skewed and resemble a mixture of previously mapped acquaintance networks (Amaral, Scala et al. 2000) for few contacts and partnership networks (Liljeros, Edling et al. 2001) for many contacts.

• The geodesic length initially increases as new vertices are added to the network. But as the network settles, the increase is limited by the growing average degree. Both $\ell_{GC}$ and $\ell^{-1}$ consistently show that the average geodesic length decreases during the whole sample period (a situation that can only exist for a non-equilibrium network).

• Clustering, i.e., the density of triangles, converges over time to non-zero values (as opposed to completely random networks). Still, the values are probably much lower than would be expected in offline acquaintance networks. The explanation for these low values is twofold: the lack of introduction as a mechanism for tie formation, and the romantic profile of pussokram.com promoting romantic contacts. The latter aspect is also manifested in that the density of 4-circuits is larger than the density of triangles for the pussokram.com networks. Once again, the Internet community networks differ from the scientific collaboration network, where clustering is larger than the density of 4-circuits.

An Internet community such as pussokram.com defines a structured social network that shares more of its structuring forces with general acquaintance networks than networks of professional collaboration do. We believe that the precise timing resolution and fast dynamics (giving a wide effective sampling time-frame) will make Internet communities an invaluable object for future studies of social networks on the largest scale.